Andres Karjus
Centre for Language Evolution, University of Edinburgh
& University of Tartu
a.karjus@sms.ed.ac.uk | andreskarjus.github.io | @AndresKarjus
Why and how does language (~culture in general) change over time?
Generate a “topic” for each target word, consisting of m topic words, based on co-occurrence
...the question. Does an iced latte count as a dairy product?
...social correctness, cappuccino, latte, microbrewed beers. Live...
...be spitting in Ross' latte when he's not looking
...Seattle are sipping decaf mocha latte nectar in a local cafe
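The topic-generation step can be sketched roughly as follows. This is a minimal illustration, not the implementation behind the poster: the function name `ppmi_topic`, the symmetric context window, and the base-2 logarithm are all assumptions made for the example. It counts co-occurrences within a window, scores each context word of the target by positive pointwise mutual information (PPMI), and keeps the top m.

```python
import math
from collections import Counter

def ppmi_topic(sentences, target, m=3, window=2):
    """Return the m context words with the highest PPMI to `target`.

    `sentences` is a list of token lists. The window size and the log base
    are illustrative assumptions, not settings taken from the poster.
    """
    pair = Counter()  # (word, context word) -> co-occurrence count
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair[(w, tokens[j])] += 1

    total = sum(pair.values())
    marginal = Counter()  # the window is symmetric, so row and column marginals coincide
    for (w, _c), n in pair.items():
        marginal[w] += n

    scores = {}
    for (w, c), n in pair.items():
        if w != target:
            continue
        pmi = math.log2(n * total / (marginal[w] * marginal[c]))
        if pmi > 0:  # PPMI keeps only positive associations
            scores[c] = pmi

    # Highest PPMI first; ties broken alphabetically for reproducibility
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:m]

# Toy usage with made-up sentences
toy = [["iced", "latte"], ["iced", "latte"], ["the", "cat"], ["the", "dog"]]
topic = ppmi_topic(toy, "latte", m=3, window=1)
```

On a real corpus one would typically also filter out rare words, since PMI tends to overrate low-frequency contexts.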
Advection: a measure of how much these topic words have changed in frequency on average (weighted by an association score) between two periods.
word        cappuccino  libel  espresso  resolutely  vibe   nectar  iced   scald  ...
log change  +0.17       -1.40  +0.45     +0.12       +0.67  -0.41   -0.12  +0.07  ...
PPMI        11.51       10.3   10.3      9.25        9.05   8.9     8.89   8.72   ...
+1.19 log frequency change 1990s->2000s (1.91pmw->2.18pmw)
+0.07 advection (weighted mean log frequency change in topic words)
-Top positive residuals (~selection): celery root, sauerkraut, baking powder, granulated sugar, pork fat, corn starch
-Top negative: powder sugar, powdered loaf sugar, pearl ash (potassium carbonate), sauce, lemon juice, tomata
Red: new ingredients (size = log frequency increase). Gray: old ingredients. Links: width indicates cosine similarity.
degree(new ingredients) ~ advection(new ingr); R^2=0.31
degree(new ingredients) ~ advection(new ingr) * mean(degrees(old ingr neighbors)); R^2=0.4
The advection value of a word in time period \(t\) is defined as the weighted mean of the changes in frequency (compared to the previous period) of the words associated with it, that is, its topic words. More precisely, the topical advection value for a word \(\omega\) at time period \(t\) is
\[\begin{equation} {\rm advection}(\omega;t) := {\rm weightedMean}\big( \{ {\rm logChange}(N_i;t) \mid i=1,\dots,m \}, \, W \big) \end{equation}\] where \(N\) is the set of \(m\) words associated with the target at time \(t\) and \(W\) is the set of weights (to be defined below) corresponding to those words. The weighted mean is simply
\[\begin{equation} {\rm weightedMean}(X, W) := \frac{\sum_{i=1}^{m} x_i w_i }{\sum_{i=1}^{m} w_i} \end{equation}\] where \(x_i\) and \(w_i\) are the \(i^{\rm th}\) elements of the sets \(X\) and \(W\) respectively. The log change for period \(t\) for each of the associated words \(\omega'\) is given by the change in the logarithm of its frequency from the previous to the current period. That is,
\[\begin{equation} {\rm logChange}(\omega';t) := \log[f(\omega';t)+1] - \log[f(\omega';t-1)+1] \end{equation}\] where \(f(\omega';t)\) is the number of occurrences of word \(\omega'\) in the time period \(t\). Note that we add \(1\) to these frequency counts to avoid \(\log(0)\) appearing in the expression.
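The three definitions above translate almost directly into code. The following is a minimal sketch under stated assumptions: the function names mirror the formulas, while the input layout (a list of `(previous count, current count)` pairs plus a parallel list of weights) is a hypothetical choice made for illustration, not the author's implementation.

```python
import math

def log_change(count_now, count_prev):
    """Log frequency change between periods, with +1 smoothing to avoid log(0)."""
    return math.log(count_now + 1) - math.log(count_prev + 1)

def weighted_mean(values, weights):
    """Sum of x_i * w_i divided by the sum of w_i."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

def advection(topic_counts, weights):
    """Weighted mean log change of a target word's topic words.

    topic_counts: list of (count_prev, count_now) pairs, one per topic word.
    weights: association scores (e.g. PPMI) in the same order.
    """
    changes = [log_change(now, prev) for prev, now in topic_counts]
    return weighted_mean(changes, weights)

# Usage with the eight topic words listed on the poster. The listed topic is
# truncated ("..."), so this partial value (roughly -0.06) is not expected to
# match the +0.07 reported for the full topic.
log_changes = [0.17, -1.40, 0.45, 0.12, 0.67, -0.41, -0.12, 0.07]
ppmi = [11.51, 10.3, 10.3, 9.25, 9.05, 8.9, 8.89, 8.72]
partial = weighted_mean(log_changes, ppmi)
```

Keeping `weighted_mean` separate from `log_change` makes it possible to plug in precomputed changes, as in the usage lines, or raw counts through `advection`.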
Based on ingredient co-occurrence in the cookbooks. Displaying only the ingredients with at least one neighbor at similarity >0.6 (similarity corresponds to edge width) and excluding nodes with weaker similarity links. Color corresponds to log frequency change (red = increase).
*This research was supported by the scholarship program Kristjan Jaak, funded and managed by the Archimedes Foundation in collaboration with the Ministry of Education and Research of Estonia.